Skip to content

fix(auth): atomic locked credential writes, safe usage-queue flush, MCP refresh recovery#83

Merged
Aayam Bansal (aayambansal) merged 1 commit into
mainfrom
fix/credential-store-atomicity
Jul 5, 2026
Merged

fix(auth): atomic locked credential writes, safe usage-queue flush, MCP refresh recovery#83
Aayam Bansal (aayambansal) merged 1 commit into
mainfrom
fix/credential-store-atomicity

Conversation

@aayambansal

Copy link
Copy Markdown
Member

Credential stores: atomic writes, locking, corrupt-file safety

Auth.set/Auth.remove (and the McpAuth equivalents) were unlocked read-modify-write cycles over a truncate-then-write Bun.write, and a file that existed but did not parse silently read as {}. The combination meant a concurrent writer could drop a provider, and a torn read or disk-full truncation caused the next routine token refresh to rewrite auth.json containing only itself — permanently destroying every other credential.

Both stores now route through a shared util/jsonstore.ts:

  • Writes go to a temp file in the same directory followed by a rename, so a crash or concurrent reader never observes a torn file.
  • Every read-modify-write cycle is serialized behind the repo's existing Lock (the same util BunProc.install uses), keyed by file path. The McpAuth.update* helpers each became a single atomic cycle instead of a racy getset pair.
  • On the write path, a store that exists but cannot be parsed is fatal: the corrupt file is backed up alongside as <file>.corrupt-<pid> and the write throws with a clear message instead of proceeding with {}. The read path (all()) still degrades to {} so the CLI boots.

Fixes #54

Usage queue: locked flush, mid-flush appends survive

flushPendingUsage now holds the queue lock across its whole read → send → rewrite cycle (and persistToQueue takes the same lock), so concurrent flushes in one process can't double-send. The final rewrite re-reads the file and removes only the lines actually read — counted, so a report queued twice is removed exactly twice — meaning an append that lands mid-flush from another process is preserved instead of deleted. Flush remains best-effort and never throws at startup.

Fixes #58

MCP OAuth: refresh-token rotation recovery (the #20 pattern)

McpOAuthProvider now applies the codex recovery pattern to token refresh: when the persisted access token is expired, tokens() refreshes it through the SDK's discovery + refreshAuthorization helpers with a single-flight guard per server, so concurrent callers share one round-trip instead of racing a rotating refresh token. On refresh failure it re-reads the persisted tokens — if another process won the race, its still-valid access token is used directly, and a rotated refresh token is retried once — before falling back to the SDK's normal re-auth path. Successful refreshes persist through the now-locked, atomic McpAuth store, closing the cross-process persistence race underneath.

Fixes #59

Tests

  • test/auth/auth.test.ts: concurrent Auth.set calls lose no provider; a corrupt auth.json makes set/remove throw, leaves the original untouched, and creates the backup; unknown entries survive writes; no temp files left behind.
  • test/mcp/auth.test.ts: concurrent McpAuth updates keep every server entry; corrupt-store write refusal.
  • test/mcp/oauth-refresh.test.ts: against a real loopback OAuth server — concurrent expired-token callers trigger exactly one refresh; a rejected refresh recovers via the rotated token another process persisted; a still-valid access token persisted by the winner is reused without a second refresh.
  • test/openscience-usage.test.ts: flush leaves the queue intact when unauthenticated; malformed-only queues are dropped and unlinked.

bun test (846 pass) and bun run typecheck are green.

…CP refresh recovery

auth.json and mcp-auth.json writes were unlocked read-modify-write cycles
over a truncate-then-write Bun.write, so concurrent writers could drop
providers and a corrupt file silently read as {} and got rewritten
containing only the entry being saved. Route both stores through a shared
JsonStore util: temp-file-plus-rename writes, read-modify-write serialized
behind the in-process Lock, and a corrupt store is now fatal on the write
path (backed up alongside as *.corrupt-<pid>) while the read path still
degrades to {} so the CLI boots.

usage-queue.jsonl flush now holds the same Lock across its read, send, and
rewrite cycle and rewrites the file excluding only the lines it actually
read, so an append landing mid-flush is preserved instead of deleted.
Flush stays best-effort and never throws at startup.

MCP OAuth token refresh gets the codex recovery pattern: refreshes are
single-flight per server, and a failed refresh re-reads the persisted
tokens and retries once with a rotated refresh token another process
persisted before surfacing re-auth.
@vercel

vercel Bot commented Jul 5, 2026

Copy link
Copy Markdown

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
openscience Ready Ready Preview, Comment Jul 5, 2026 11:48am

Request Review

@aayambansal Aayam Bansal (aayambansal) merged commit 81757b5 into main Jul 5, 2026
11 checks passed
@aayambansal Aayam Bansal (aayambansal) deleted the fix/credential-store-atomicity branch July 5, 2026 11:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

1 participant